    Languages for Big Data analysis

    Intrusion Detection in Industrial Networks via Data Streaming

Given the increasing threat surface of industrial networks due to distributed, Internet-of-Things (IoT) based system architectures, detecting intrusions in Industrial IoT (IIoT) systems is all the more important because of the safety implications of potential threats. The continuously generated data in such systems pose both a challenge and an opportunity: data volumes and rates are high and demand processing and communication capacity, yet the data contain information useful for system operation and for detecting unwanted situations. In this chapter we explain that stream processing (a.k.a. data streaming) is an emerging and useful approach both for general applications and for intrusion detection in particular, especially since it enables data analysis to be carried out across the edge-fog-cloud continuum of distributed industrial network architectures, thus reducing communication latency and gradually filtering and aggregating data volumes. We argue that its usefulness also stems from facilitating agile responses, i.e. potentially lower intrusion-detection latency and hence improved possibilities for intrusion mitigation. In the chapter we outline architectural features of IIoT networks, potential threats, and examples of state-of-the-art intrusion detection methodologies. Moreover, we give an overview of how leveraging distributed and parallel execution of streaming applications in industrial setups can influence the possibilities for protecting these systems, giving examples from electricity networks (a.k.a. Smart Grid systems). We conclude that future industrial networks, and especially their Intrusion Detection Systems (IDSs), should take advantage of the data streaming concept by decoupling semantics from deployment.
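    To make the filtering-and-aggregation argument concrete, here is a minimal, self-contained sketch of an edge-side anomaly filter: raw readings stay local, and only alerts (or window aggregates) would travel upstream. The window size, z-score threshold and reading format are illustrative assumptions, not the chapter's own method.

```python
# Minimal sketch of edge-side filtering/aggregation for intrusion detection.
# window_size, z_threshold and the (time, value) reading format are
# illustrative assumptions, not taken from the chapter.
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(readings, window_size=60, z_threshold=3.0):
    """Slide a fixed-size window over a stream of numeric readings and
    emit an alert whenever a reading deviates strongly from the recent
    window statistics. Only alerts (not raw data) travel upstream,
    which is the volume-reduction effect argued for above."""
    window = deque(maxlen=window_size)
    for t, value in readings:
        if len(window) == window.maxlen:
            mu, sigma = mean(window), pstdev(window)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield {"time": t, "value": value, "mean": mu, "alert": True}
        window.append(value)

# Example: a smart-meter-like stream with one injected spike.
stream = [(t, 50.0 + (t % 3)) for t in range(100)] + [(100, 500.0)]
for alert in detect_anomalies(stream):
    print(alert)
```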

    Why High-Performance Modelling and Simulation for Big Data Applications Matters

Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions, while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications. The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications has created a strategic framework to foster interaction between M&S experts from various application domains on the one hand and HPC experts on the other, in order to develop effective solutions for big data applications. One of the tangible outcomes of the COST Action is a collection of case studies from various computing domains. Each case study brought together HPC and M&S experts, demonstrating the effective cross-pollination facilitated by the COST Action. In this introductory article we argue why joining forces between the M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, we provide an overview of the state of the art in the various research areas concerned.

    Requirements Engineering for Large-Scale Big Data Applications

As the use of smartphones proliferates and human interaction through social media intensifies around the globe, the amount of data available to process is greater than ever before. As a consequence, the design and implementation of systems capable of handling such vast amounts of data in acceptable timescales has moved to the forefront of academic and industry-based research. This research represents a unique contribution to the field of software engineering for Big Data in the form of an investigation of the big data architectures of three well-known real-world companies: Facebook, Twitter and Netflix. The purpose of this investigation is to gather significant non-functional requirements for real-world big data systems, with the aim of addressing these requirements in the design of our own reference architecture for big data processing in the cloud: MC-BDP (Multi-Cloud Big Data Processing). MC-BDP represents an evolution of the PaaS-BDP (Platform as a Service for Big Data Processing) architectural pattern, previously developed by the authors; its presentation, however, is not within the scope of this study. The scope of this comparative study is limited to the examination of academic papers, technical blogs, presentations, source code and documentation officially published by the companies under investigation. Ten non-functional requirements are identified and discussed in the context of these companies' architectures: batch data, stream data, late and out-of-order data, processing guarantees, integration and extensibility, distribution and scalability, cloud support and elasticity, fault tolerance, flow control, and flexibility and technology agnosticism. They are followed by the conclusion and considerations for future work.
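    To make one of the listed requirements concrete, the following framework-free sketch shows how late and out-of-order data can still be credited to the correct event-time window. The tumbling-window length, allowed lateness and watermark model are illustrative assumptions, not taken from the study.

```python
# A minimal sketch of event-time windowing with support for late and
# out-of-order data. WINDOW and ALLOWED_LATENESS are assumed values.
from collections import defaultdict

WINDOW = 10           # seconds per tumbling window
ALLOWED_LATENESS = 5  # seconds a record may lag behind the newest one seen

def tumbling_counts(events):
    """Events are (event_time, key) pairs that may arrive out of order.
    A record is assigned to the window of its *event* time; windows are
    finalized only once the watermark (newest time seen minus the allowed
    lateness) has passed their end, so moderately late data still counts."""
    counts = defaultdict(int)
    watermark = float("-inf")
    for event_time, key in events:
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
        window_start = (event_time // WINDOW) * WINDOW
        if event_time >= watermark:            # not too late: include it
            counts[(window_start, key)] += 1
        closed = [w for w in counts if w[0] + WINDOW <= watermark]
        for w in sorted(closed):
            yield w, counts.pop(w)
    # (windows still open when the input ends are not flushed in this sketch)

# Out-of-order input: the record at t=12 arrives after t=16 but still
# falls within the allowed lateness, so it is counted in window [10, 20).
events = [(1, "a"), (3, "a"), (16, "b"), (12, "a"), (31, "b")]
for window, count in tumbling_counts(events):
    print(window, count)
```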

    ETL

    Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine

Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. In an ongoing industrial project, a 24/7 available stream processing engine usually faces dynamically changing data and workload characteristics. These changes impact the engine's performance and reliability. We propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes its logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. We highlight the efficiency of Strider on real-world and synthetic data sets (e.g., on a single machine, up to a 60x throughput gain compared to state-of-the-art systems, and a throughput of 3.1 million triples/second on a 9-machine cluster, a major breakthrough in this system category).
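    Strider's actual code is not shown in this abstract; as a rough illustration of the pipeline shape it describes (a Kafka source feeding a continuous Spark query over a triple stream), a PySpark sketch might look as follows. The topic name, broker address and whitespace-separated "subject predicate object" encoding are assumptions for the example.

```python
# Illustrative PySpark sketch of a Kafka-fed continuous query over an
# RDF triple stream; NOT Strider's actual implementation. Topic, brokers
# and the triple encoding are assumed.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split, window

spark = SparkSession.builder.appName("rdf-stream-sketch").getOrCreate()

# Each Kafka record is assumed to hold one "subject predicate object" line.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "rdf-triples")
       .load())

triples = raw.select(
    col("timestamp"),
    split(col("value").cast("string"), " ").alias("spo"),
).select(
    col("timestamp"),
    col("spo")[0].alias("s"), col("spo")[1].alias("p"), col("spo")[2].alias("o"),
)

# A simple continuous query: count triples per predicate in 10-second windows.
counts = triples.groupBy(window("timestamp", "10 seconds"), "p").count()

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```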

D2IA: Stream Analytics on User-Defined Event Intervals

Nowadays, modern Big Stream Processing solutions (e.g. Spark, Flink) are working towards becoming ultimate frameworks for streaming analytics. To achieve this goal, they have started to offer SQL extensions that incorporate stream-oriented primitives such as windowing and Complex Event Processing (CEP). The former enables stateful computation on infinite sequences of data items, while the latter focuses on the detection of event patterns. In most cases, data items and events are considered instantaneous, i.e., they are single time points in a discrete temporal domain. Nevertheless, point-based time semantics does not satisfy the requirements of a number of use cases. For instance, it is not possible to detect the interval during which the temperature increases until it begins to decrease, nor all the relations this interval subsumes. To tackle this challenge, we present D2IA, a set of novel abstract operators for defining analytics on user-defined event intervals based on raw events and for efficiently reasoning about temporal relationships between intervals and/or point events. We implement the concepts of D2IA on top of Esper, a centralized stream processing system, and Flink, a distributed stream processing engine for big data.
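    The temperature example above can be sketched in plain Python: derive a "temperature rising" interval event from raw point events, then test an Allen-style relation between intervals. This is only an illustration of the idea, not D2IA's actual operator API; all names are made up for the example.

```python
# Plain-Python sketch of user-defined interval events; not D2IA's API.
def rising_intervals(points):
    """points: (timestamp, temperature) pairs in time order.
    Yields (start, end) intervals during which the temperature strictly
    increases, closed when the first non-increase is seen."""
    start = prev_t = prev_v = None
    for t, v in points:
        if prev_v is not None and v > prev_v:
            start = prev_t if start is None else start
        elif start is not None:
            yield (start, prev_t)
            start = None
        prev_t, prev_v = t, v
    if start is not None:
        yield (start, prev_t)

def during(inner, outer):
    """Allen's 'during' relation: inner lies strictly inside outer."""
    return outer[0] < inner[0] and inner[1] < outer[1]

readings = [(0, 20), (1, 21), (2, 23), (3, 22), (4, 24), (5, 25)]
print(list(rising_intervals(readings)))   # [(0, 2), (3, 5)]
print(during((3, 5), (0, 10)))            # True
```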

    A Benchmark Generator for Online First-Order Monitoring

We present a randomized benchmark generator for assessing the correctness and performance of online first-order monitors. The benchmark generator consists of three components: a stream generator, a stream replayer, and a monitoring oracle. The stream generator produces random event streams that conform to user-defined characteristics such as event frequencies and distributions of the events' parameters. The stream replayer reproduces event streams in real time at a user-defined velocity. By varying the stream characteristics and velocity, one can analyze their impact on the monitor's performance. The monitoring oracle provides the expected result of monitoring the generated streams against metric first-order regular specifications. The specification languages supported by most existing monitors are either a subset of, or share a large common fragment with, the oracle's language. Thus, we envision that our benchmark generator will be used as a standard correctness and performance testing tool for online monitors.
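    As an illustration of the generator/replayer split described above (not the tool's actual interface), the following toy sketch draws events according to user-defined frequency weights and a parameter distribution, then replays them at a configurable velocity. Event names, the rate parameter and the timing model are assumptions.

```python
# Toy sketch of a stream generator plus replayer; NOT the paper's tool.
import random
import time

def generate_stream(n, event_weights, rate_hz=100.0, seed=0):
    """Produce n timestamped events. Event types follow the given
    frequency weights; inter-arrival times are exponential with the
    given rate; the 'user' parameter follows a uniform distribution."""
    rng = random.Random(seed)
    names = list(event_weights)
    t = 0.0
    for _ in range(n):
        t += rng.expovariate(rate_hz)
        name = rng.choices(names, weights=list(event_weights.values()))[0]
        yield (t, name, {"user": rng.randint(1, 50)})

def replay(stream, velocity=1.0):
    """Re-emit a generated stream in real time, sped up by `velocity`,
    so one can measure a monitor's latency under different loads."""
    prev = 0.0
    for t, name, params in stream:
        time.sleep(max(0.0, (t - prev) / velocity))
        prev = t
        print(f"{t:9.4f} {name} {params}")

replay(generate_stream(10, {"login": 3, "withdraw": 1}), velocity=10.0)
```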